Abstract. Some of the most important results in prediction theory and time
series analysis, concerning a process whose infinite past has finitely many
values removed from or added to it, have been obtained using difficult and
diverse techniques ranging from duality in Hilbert spaces of analytic functions
(Nakazi, 1984) to linear regression in statistics (Box and Tiao, 1975). We unify
these results via a finite-dimensional duality lemma and elementary ideas from
linear algebra. The approach reveals the inherent finite-dimensional character
of many difficult prediction problems, and the role of duality and
biorthogonality for a finite set of random variables. The lemma is particularly
useful when the number of missing values is small, like one or two, as in the
case of the Kolmogorov and Nakazi prediction problems. Stationarity of the
underlying process is not required, which opens up the possibility of extending
such results to nonstationary processes.



1. Introduction 

Irregular observations, missing values and outliers are common in time series
data (Box and Tiao (1975), Brubacher and Wilson (1976)). A framework for dealing
with such anomalies takes $X = \{X_t\}_{t\in\mathbb{Z}}$ to be a
$\mathbb{C}$-valued, mean-zero, weakly stationary stochastic process with the
autocovariance function $\gamma = \{\gamma_k\}_{k\in\mathbb{Z}}$ and the
spectral density function $f$:

$E[X_k \overline{X_l}] = \gamma_{k-l} = (2\pi)^{-1}\int_{-\pi}^{\pi} e^{-i(k-l)\lambda} f(\lambda)\,d\lambda.$

Then, the problem can be formulated as that of predicting or approximating an
unknown value $X_0$ based on the observed values $\{X_t; t \in S\}$ for a given
index set $S \subset \mathbb{Z}\setminus\{0\}$ and the knowledge of the
autocovariance of the process. Such a problem is quite important to
applications in business, economics, engineering, and the physical and natural
sciences, and belongs to the area of prediction theory of stationary stochastic
processes developed by Wiener (1949) and Kolmogorov (1941) (see also Pourahmadi
(2001)). By restricting attention to linear predictors and using the
least-squares criterion to assess the goodness of predictors, a successful
solution seeks to address the following two goals:

$(P_1)$ Express the linear least-squares predictor of $X_0$, denoted by
$\hat X_0(S)$, and the prediction error $X_0 - \hat X_0(S)$ in terms of the
observable $\{X_t; t \in S\}$.

$(P_2)$ Express the prediction error variance
$\sigma^2(S) = \sigma^2(f, S) := E|X_0 - \hat X_0(S)|^2$ in terms of $f$.

The link between solutions of finite and infinite past prediction problems serves
as a natural bridge between time series analysis and prediction theory. From the
dawn of modern time series analysis, the works of Slutsky and Yule in the 1920's
and Wold in the 1930's have been instrumental in achieving the goal $(P_1)$ in the
time-domain using the finite past. Subsequently, the classes of autoregressive (AR),
moving-average (MA) and mixed autoregressive and moving-average (ARMA) models
have played major roles in the development of time-domain techniques using
the autocovariance function of the process (see Box et al. (1994)). Nowadays, these
techniques are implemented by solving the Yule-Walker equations via the celebrated
Durbin-Levinson algorithm and the innovation algorithm (see Brockwell and Davis
(1991)). On the other hand, the spectral-domain techniques in prediction of
stationary processes, advocated by Kolmogorov and Wiener in the early 1940's, rely on
the spectral representations of the process and its covariance (Kolmogorov (1941),
Wiener (1949), Pourahmadi (2001)).

The focus in prediction theory is more on the goal $(P_2)$. The celebrated
Szegő-Kolmogorov-Wiener theorem gives the variance of the one-step ahead
prediction error based on the infinite past indexed by the "half-line"
$S_0 := \{\ldots, -2, -1\}$ by

(1.1)  $\sigma^2(f, S_0) = \exp\Big(\frac{1}{2\pi}\int_{-\pi}^{\pi} \log f(\lambda)\,d\lambda\Big) > 0$

if $\log f$ is integrable, and otherwise $\sigma^2(S_0) = 0$. However, when the
first $n$ consecutive integers are removed from $S_0$, that is, for the index
set $S_{-n} := \{\ldots, -n-2, -n-1\}$, $n \ge 0$, the formula for the
$(n+1)$-step prediction error variance (Wold (1938), Kolmogorov (1941)) is

(1.2)  $\sigma^2(f, S_{-n}) = |b_0|^2 + |b_1|^2 + \cdots + |b_n|^2, \quad n = 0, 1, \ldots,$

where $\{b_j\}$, the MA coefficients of the process, are related to the Fourier
coefficients of $\log f$, and $|b_0|^2 = \sigma^2(S_0)$ (see Nakazi and
Takahashi (1980) and Pourahmadi (1984); see also Section 3 below).

A result similar to (1.1) for the interpolation of a single missing value
corresponding to the index set $S_\infty := \mathbb{Z}\setminus\{0\}$ was
obtained by Kolmogorov (1941). Specifically, the interpolation error variance
is given by

(1.3)  $\sigma^2(f, S_\infty) = \Big(\frac{1}{2\pi}\int_{-\pi}^{\pi} f(\lambda)^{-1}\,d\lambda\Big)^{-1} > 0$

if $f^{-1} \in L^1 := L^1([-\pi, \pi], d\lambda/(2\pi))$, and otherwise
$\sigma^2(S_\infty) = 0$. The corresponding prediction problem for the smaller
index set $S_n := \{\ldots, n-1, n\}\setminus\{0\}$, $n \ge 0$, was stated as
open in Rozanov (1967, p. 107) and is perhaps one of the most challenging
problems in prediction theory next to (1.1). The index set $S_n$ is, indeed, of
special interest as it forms a bridge connecting $S_0$ and $S_\infty$; it
reduces to $S_0$ when $n = 0$ and tends to $S_\infty$ as $n \to \infty$. In a
remarkable paper in 1984, Nakazi, using delicate but complicated analytical
techniques (and assuming that $f^{-1} \in L^1$), showed that

(1.4)  $\sigma^2(f, S_n) = \big(|a_0|^2 + |a_1|^2 + \cdots + |a_n|^2\big)^{-1}, \quad n = 0, 1, \ldots,$

where $\{a_j\}$ is related to the AR parameters of the process (see Section 3 below).
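
As a small illustration of how (1.2) and (1.4) mirror each other, the sketch below evaluates both for an AR(1) model. The model, the parameter value and all function names are assumptions chosen for this example only; for such a model the MA coefficients are $b_k = \phi^k$ and the AR coefficients are $a_0 = 1$, $a_1 = -\phi$, $a_k = 0$ for $k \ge 2$ (unit innovation variance).

```python
from fractions import Fraction

# Hypothetical AR(1) model X_t = PHI * X_{t-1} + e_t with unit innovation variance.
PHI = Fraction(1, 2)

def ma_coefficient(k):
    """b_k = PHI**k for the AR(1) model above."""
    return PHI ** k

def ar_coefficient(k):
    """a_0 = 1, a_1 = -PHI, a_k = 0 for k >= 2."""
    if k == 0:
        return Fraction(1)
    return -PHI if k == 1 else Fraction(0)

def multistep_variance(n):
    """Formula (1.2): variance of the (n+1)-step prediction error."""
    return sum(abs(ma_coefficient(k)) ** 2 for k in range(n + 1))

def nakazi_variance(n):
    """Formula (1.4): error variance when X_1, ..., X_n are added to the past."""
    return 1 / sum(abs(ar_coefficient(k)) ** 2 for k in range(n + 1))

print([multistep_variance(n) for n in range(3)])
print([nakazi_variance(n) for n in range(3)])
```

Removing observations adds squared MA coefficients to the variance, while adding future observations adds squared AR coefficients to its reciprocal. In this AR(1) sketch the second sequence stops changing after $n = 1$, because $a_k = 0$ for $k \ge 2$.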



From (1.2) and (1.4), the question naturally arises as to why there is such an
"inverse-dual" relationship between them. In this regard, it is worth noting that
Nakazi's technique, if interpreted properly, amounts to reducing the computation of
$\sigma^2(f, S_n)$ to that of the $(n+1)$-step prediction error variance of another
stationary process $\{Y_t\}$ with the spectral density function $f^{-1}$, which turns
out to be the dual of $\{X_t\}$ (see Definition 2.1 and Section 3.5). His result and
technique have spawned considerable research in this area in the last two decades;
see Miamee and Pourahmadi (1988), Miamee (1993), Cheng et al. (1998), Frank and
Klotz (2002), Klotz and Riedel (2002) and Bondon (2002). A unifying feature of most
of the known results thus far seems to be a fundamental duality principle (Cheng et
al. (1998), Urbanik (2000)) of the form

(1.5)  $\sigma^2(f, S)\cdot\sigma^2(f^{-1}, S^c) = 1,$

where $S^c$ is the complement of $S$ in $\mathbb{Z}\setminus\{0\}$ and
$f^{-1} \in L^1$. The first occurrence of (1.5) seems to be in the 1949 Russian
version of Yaglom (1963) for the case of deleting finitely many points from
$S_\infty$. The proof of (1.5) in general, like those of the main results in
Nakazi (1984), Miamee and Pourahmadi (1988), Cheng et al. (1998), and Urbanik
(2000), is long and unintuitive, relies on duality techniques from functional
and harmonic analysis, and requires $f^{-1} \in L^1$, which is not natural for
an index set like $S_n$. Surprisingly, a version of (1.5) in a rather disguised
form was developed in Grenander and Rosenblatt (1954, Theorem 1), as the limit
of a quadratic form involving Szegő's orthogonal polynomials on the unit circle;
see also Simon (2005, p. 165). Unfortunately, it had remained dormant and not
used in the context of prediction theory, except in Pourahmadi (1993).

In this paper, we establish a finite-dimensional duality principle (Lemma 2.4),
which encapsulates (1.5) in a transparent and useful manner. The concept of the dual
of a random vector plays a central role, as does the Cholesky decomposition of its
covariance matrix. We use this duality principle to unify and solve some prediction
problems related to removing a finite number of indices from $S_n$ and $S_\infty$.
The outline of the paper is as follows. In Section 2, we present the main lemma, some
auxiliary facts about the dual of a random vector, and their consequences for
computing the prediction error variances and predictors. In Section 3, using the
lemma, we first solve three finite prediction problems for $X_0$ based on the
knowledge of $\{X_t; t \in K\}$ with $K = \{-m, \ldots, n\}\setminus(M \cup \{0\})$,
$m, n \ge 0$, where $M$, the index set of the missing values, is relatively small.
Then we obtain the solutions of Kolmogorov, Nakazi, and Yaglom's prediction problems
in a unified manner by studying the limit of the solutions by letting $m \to \infty$,
followed by $n \to \infty$. In particular, we find an explicit formula for the dual
of the process $\{X_t; t \le n\}$ for a fixed $n$, which does not seem to be possible
using the technique of Urbanik (2000), Klotz and Riedel (2002) and Frank and Klotz
(2002). This is useful in developing series representations for predictors and
interpolators, and sheds light on the approaches of Bondon (2002) and Salehi (1979).
In Section 4, we close the paper with some discussions.

Finally, we should point out that the two simple formulas (1.2) and (1.4) and
their extensions provide explicit and informative expressions for the prediction
error variances. Like their predecessors (1.1) and (1.3), they serve as yardsticks to
assess the impact (worth) of observations in predicting $X_0$ when they are added
to or deleted from the infinite past, and highlight the role of the autoregressive
and moving-average parameters for this purpose; see Pourahmadi and Soofi (2000).
In fact, Bondon (2002, Theorem 3.3; 2005) shows that a finite number of missing
values do not affect the prediction of $X_0$ if and only if the AR parameters
corresponding to the indices of those missing values are zero. Furthermore, the
examples in Section 3 indicate how the interpolators of the missing values can be
computed rigorously without resorting to formal derivations (Box and Tiao (1975),
Brubacher and Wilson (1976) and Budinsky (1989)).



2. A Finite-Dimensional Duality Principle 

In this section, an elementary result is stated as a finite-dimensional duality 
lemma, which we use in Section 3 to solve and unify various challenging prediction 
problems through the limit of the solutions of their finite past counterparts. 

For a finite index set $N$, let $H_N$ be the class of vectors $X = (X_j)_{j\in N}$
of random variables with zero mean and finite variance on a probability space
$(\Omega, \mathcal{F}, P)$:

$H_N := \{X = (X_j)_{j\in N};\ X_j \in L^2(\Omega, \mathcal{F}, P),\ E[X_j] = 0,\ j \in N\}.$

As usual, we consider the inner product $(Y, Z) := E[Y\overline{Z}]$ and norm
$\|Y\| := E[|Y|^2]^{1/2}$ for random variables in $L^2(\Omega, \mathcal{F}, P)$.

Definition 2.1. Let $N$ be a finite index set and $X \in H_N$. A random vector
$Y \in H_N$ is called the dual of $X$ if it satisfies the following conditions:

(i) The components $Y_j$, $j \in N$, belong to $\mathrm{sp}\{X_k; k \in N\}$.

(ii) $X$ and $Y$ are biorthogonal: $(X_i, Y_j) = \delta_{ij}$ for $i, j \in N$, or
$\mathrm{Cov}(X, Y) = I$.

For $X \in H_N$, $l \in N$ and $K \subset N$, we write $\hat X_l(K)$ for the linear
least-squares predictor of $X_l$ based on $\{X_k; k \in K\}$, i.e., the orthogonal
projection of $X_l$ onto $\mathrm{sp}\{X_k; k \in K\}$. For the sake of completeness
and ease of reference, in the next two propositions we summarize the
characterization, interpretation and other basic information about the dual of a
random vector in terms of its covariance matrix and certain prediction errors.

Proposition 2.2. Let $N$ be a finite index set and $X \in H_N$. Then, the following
conditions are equivalent:

(1) The components $X_j$, $j \in N$, of $X$ are linearly independent.

(2) The covariance matrix $\Gamma = (\gamma_{i,j})_{i,j\in N}$ of $X$ with
$\gamma_{i,j} = (X_i, X_j)$ is nonsingular.

(3) $X$ is minimal: $X_j \notin \mathrm{sp}\{X_i; i \in N, i \ne j\}$ for $j \in N$.

(4) $X$ has a dual.

Proof. Clearly, (1)-(3) are equivalent. Assume (3) and define
$Y = (Y_j)_{j\in N} \in H_N$ by
$Y_j = (X_j - \hat X_j(N_j))/\|X_j - \hat X_j(N_j)\|^2$, where
$N_j := N\setminus\{j\}$. Then $Y_j$ belongs to $\mathrm{sp}\{X_k; k \in N\}$, and
$(X_i, Y_j) = \delta_{ij}$ holds:

$(X_j, Y_j) = \frac{(X_j,\ X_j - \hat X_j(N_j))}{\|X_j - \hat X_j(N_j)\|^2} = \frac{(X_j - \hat X_j(N_j),\ X_j - \hat X_j(N_j))}{\|X_j - \hat X_j(N_j)\|^2} = 1,$

and for $i \ne j$,

$(X_i, Y_j) = \frac{(X_i,\ X_j - \hat X_j(N_j))}{\|X_j - \hat X_j(N_j)\|^2} = 0.$

Thus $Y$ is a dual of $X$, and hence (4). Conversely, assume (4) and let $Y$ be a
dual of $X$. If $X$ is not minimal, then there exists $j \in N$ such that
$X_j \in \mathrm{sp}\{X_i; i \in N, i \ne j\}$, that is,
$X_j = \sum_{i\ne j} c_i X_i$ for some $c_i \in \mathbb{C}$, and, since
$(X_i, Y_j) = 0$ for $i \ne j$, we have
$(X_j, Y_j) = \sum_{i\ne j} c_i (X_i, Y_j) = 0$. However, this contradicts
$(X_j, Y_j) = 1$. Thus, $X$ is minimal, and (3) follows. □

The proof reveals the importance of the "standardized" interpolation errors of
components of $X$ in defining its dual. More explicit representations and other
properties of the dual are given next.

Proposition 2.3. For a finite index set $N$, let $X \in H_N$ with covariance matrix
$\Gamma$. Assume that $X$ has a dual $Y$. Then the following assertions hold:

(1) The dual $Y$ is unique.

(2) The dual $Y$ is given by $Y_j = (X_j - \hat X_j(N_j))/\|X_j - \hat X_j(N_j)\|^2$
with $N_j := N\setminus\{j\}$ for $j \in N$.

(3) The dual $Y$ is also given by $Y = \Gamma^{-1}X$ or
$Y_i = \sum_{j\in N} \gamma^{i,j} X_j$, $i \in N$, where
$\Gamma^{-1} = (\gamma^{i,j})_{i,j\in N}$.

(4) The covariance matrix of $Y$ is equal to $\Gamma^{-1}$.

(5) The dual of $Y$ is $X$.

(6) $\mathrm{sp}\{X_j; j \in N\} = \mathrm{sp}\{Y_j; j \in N\}$.

Proof. First, we prove (1). Let $Z$ be another dual of $X$ and $j \in N$ be fixed.
Then $(X_i, Y_j - Z_j) = 0$ for all $i \in N$. However, since
$Y_j - Z_j \in \mathrm{sp}\{X_k; k \in N\}$, it follows that $Y_j = Z_j$ and hence
(1). (2) follows from the proof of Proposition 2.2. To prove (3) and (4), we put
$Y = \Gamma^{-1}X$. Then $Y_j \in \mathrm{sp}\{X_k; k \in N\}$. Since $\Gamma^{-1}$
is Hermitian, we have

$\mathrm{Cov}(X, Y) = \mathrm{Cov}(X, X)\,\Gamma^{-1} = \Gamma\Gamma^{-1} = I,$

$\mathrm{Cov}(Y, Y) = \Gamma^{-1}\,\mathrm{Cov}(X, X)\,\Gamma^{-1} = \Gamma^{-1}\Gamma\Gamma^{-1} = \Gamma^{-1}.$

Thus (3) and (4) follow. Finally, we obtain (5) and (6) from (3) and (4). □

From the two representations in Proposition 2.3 (2), (3) for the dual $Y$, we find
the following representation for the standardized interpolation error:

$\frac{X_i - \hat X_i(N_i)}{\|X_i - \hat X_i(N_i)\|^2} = \sum_{j\in N} \gamma^{i,j} X_j \quad\text{with } N_i = N\setminus\{i\}.$

In particular, $\gamma^{i,i} = 1/\|X_i - \hat X_i(N_i)\|^2$. Notice that these
equalities hold even if $\Gamma$ is not a Toeplitz matrix or $X$ is not a segment of
a stationary process. For some statistical/physical interpretations of the entries
of $\Gamma^{-1}$, the inverse of a stationary covariance matrix, see Bhansali (1990)
and references therein.
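
The construction $Y = \Gamma^{-1}X$ in Proposition 2.3 can be checked on a toy example. The following is a minimal sketch, assuming a hypothetical sample space of four equally likely outcomes and three hand-picked zero-mean random variables; the helper names `inverse` and `inner` are illustrative and not part of the paper.

```python
from fractions import Fraction

def inverse(matrix):
    """Invert a square matrix exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def inner(u, v):
    """(U, V) = E[U V] for real variables on equally likely outcomes."""
    return sum(Fraction(a) * b for a, b in zip(u, v)) / len(u)

# Hypothetical zero-mean random variables, listed by their values on 4 outcomes.
X = [[1, -1, 0, 0], [1, 0, -1, 0], [1, 0, 0, -1]]

gamma = [[inner(xi, xj) for xj in X] for xi in X]
gamma_inv = inverse(gamma)

# Dual vector Y = Gamma^{-1} X, computed outcome by outcome.
Y = [[sum(gamma_inv[i][j] * X[j][w] for j in range(len(X)))
      for w in range(len(X[0]))] for i in range(len(X))]

cross = [[inner(xi, yj) for yj in Y] for xi in X]  # biorthogonality: identity
cov_Y = [[inner(yi, yj) for yj in Y] for yi in Y]  # Proposition 2.3 (4): Gamma^{-1}
```

Here `cross` should be the identity matrix and `cov_Y` should coincide with `gamma_inv`; neither check uses stationarity or a Toeplitz structure.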
Now, we are ready to state the main duality lemma. 

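Lemma 2.4. Let $N$ be a finite index set. Assume that $X \in H_N$ has the dual
$Y \in H_N$ and that $K$, $M$ and a singleton $\{l\}$ partition $N$, i.e.,

$N = K \cup \{l\} \cup M$ (disjoint union).

Then the following equalities hold:

(a) $X_l - \hat X_l(K) = \dfrac{Y_l - \hat Y_l(M)}{\|Y_l - \hat Y_l(M)\|^2}$,

(b) $\|X_l - \hat X_l(K)\| = \dfrac{1}{\|Y_l - \hat Y_l(M)\|}$.

Proof. Since $X$ and $Y$ are minimal and biorthogonal, $X_l - \hat X_l(K)$ and
$Y_l - \hat Y_l(M)$ are nonzero and belong to the same one-dimensional space, that
is, the orthogonal complement of
$\mathrm{sp}\{X_k; k \in K\} \oplus \mathrm{sp}\{Y_j; j \in M\}$ in
$\mathrm{sp}\{X_j; j \in N\}$. Therefore, one is a multiple of the other; for some
$c \in \mathbb{C}$,

$X_l - \hat X_l(K) = c\,(Y_l - \hat Y_l(M)).$

But, since $c$ is equal to

$\frac{(c\,(Y_l - \hat Y_l(M)),\ Y_l)}{(Y_l - \hat Y_l(M),\ Y_l - \hat Y_l(M))} = \frac{(X_l - \hat X_l(K),\ Y_l)}{\|Y_l - \hat Y_l(M)\|^2} = \frac{(X_l, Y_l)}{\|Y_l - \hat Y_l(M)\|^2} = \frac{1}{\|Y_l - \hat Y_l(M)\|^2},$

we get (a) and (b) and hence the lemma. □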

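In the applications in Section 3, we use this duality in the form of the next
lemma, which gives a way of computing the predictor coefficients and prediction
error variance using the inverse matrix $\Gamma^{-1} = (\gamma^{i,j})$.

Lemma 2.5. Let $N$, $X = (X_j)_{j\in N}$, $Y = (Y_j)_{j\in N}$, $K$, $M$ and
$\{l\}$ be as in Lemma 2.4, with $\Gamma = (\gamma_{i,j})_{i,j\in N}$ the covariance
matrix of $X$ and $\Gamma^{-1} = (\gamma^{i,j})_{i,j\in N}$. Then

(2.1)  $X_l - \hat X_l(K) = \sum_{i\in M\cup\{l\}} \alpha'_i Y_i,$

(2.2)  $\|X_l - \hat X_l(K)\|^2 = \alpha'_l,$

where $(\alpha'_i)_{i\in M\cup\{l\}}$ is the solution to the following system of
linear equations:

(2.3)  $\sum_{i\in M\cup\{l\}} \alpha'_i \gamma^{i,j} = \delta_{lj}, \quad j \in M\cup\{l\}.$

In particular, the prediction error variance
$\sigma_l^2(K) = \|X_l - \hat X_l(K)\|^2$ is given by

(2.4)  $\sigma_l^2(K) = $ the $(l, l)$-entry of the inverse of $(\gamma^{i,j})_{i,j\in M\cup\{l\}}$,

and the predictor coefficients $\alpha_k$ in
$\hat X_l(K) = \sum_{k\in K} \alpha_k X_k$ are given by

(2.5)  $\alpha_k = -\sum_{i\in M\cup\{l\}} \alpha'_i \gamma^{i,k}, \quad k \in K,$

whence we have

(2.6)  $\hat X_l(K) = -\sum_{k\in K} \Big(\sum_{i\in M\cup\{l\}} \alpha'_i \gamma^{i,k}\Big) X_k.$

Proof. Since the $Y_j$'s are linearly independent, Lemma 2.4 (a) shows that
$X_l - \hat X_l(K)$ is uniquely expressed in the form (2.1). Then
$\alpha'_l = \|Y_l - \hat Y_l(M)\|^{-2}$, which, in view of Lemma 2.4 (b), is equal
to $\|X_l - \hat X_l(K)\|^2$, and (2.2) holds. Since $(X_i, Y_j) = \delta_{ij}$
and $(Y_i, Y_j) = \gamma^{i,j}$, the predictor coefficients $\alpha_k$ in
$\hat X_l(K) = \sum_{k\in K} \alpha_k X_k$ satisfy

$\alpha_k = (\hat X_l(K), Y_k) = \Big(X_l - \sum_{i\in M\cup\{l\}} \alpha'_i Y_i,\ Y_k\Big) = -\sum_{i\in M\cup\{l\}} \alpha'_i \gamma^{i,k}, \quad k \in K.$

Thus (2.5), whence (2.6). Similarly, for $j \in M\cup\{l\}$, we have
$(\hat X_l(K), Y_j) = 0$ and

$\delta_{lj} = (X_l, Y_j) = (X_l - \hat X_l(K),\ Y_j) = \sum_{i\in M\cup\{l\}} \alpha'_i \gamma^{i,j}.$

Therefore, (2.3) follows. Finally, we obtain (2.4) from (2.2) and (2.3). □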

Recall that the predictor coefficients $\alpha_k = \alpha_{k,l}(K)$ in
$\hat X_l(K) = \sum_{k\in K} \alpha_k X_k$ and the prediction error variance
$\sigma^2 = \sigma_l^2(K) = \|X_l - \hat X_l(K)\|^2$ are traditionally computed from
$(\gamma_{i,j})_{i,j\in K\cup\{l\}}$ by solving the normal equations:

(2.7)  $\sum_{k\in K} \alpha_k \gamma_{k,j} = \gamma_{l,j}, \quad j \in K; \qquad \sigma^2 = \gamma_{l,l} - \sum_{k\in K} \alpha_k \gamma_{k,l}.$

Alternatively, one could write the above as an analogue of the Yule-Walker
equations:

(2.8)  $\gamma_{l,j} - \sum_{k\in K} \alpha_k \gamma_{k,j} = \delta_{lj}\,\sigma^2, \quad j \in K\cup\{l\}.$

Then $\sigma^2 = \sigma_l^2(K)$ can be identified as

(2.9)  $\sigma_l^2(K) = \big[\text{the } (l, l)\text{-entry of the inverse of } (\gamma_{i,j})_{i,j\in K\cup\{l\}}\big]^{-1}.$

In addition, using Cramer's rule, one may write $\sigma^2$ in (2.9) as the ratio of
the two relevant determinants:
$\sigma^2 = \det(\gamma_{i,j})_{i,j\in K\cup\{l\}} / \det(\gamma_{i,j})_{i,j\in K}$.

In spite of the simplicity of (2.7)-(2.9), they are not convenient for the study of
the asymptotic behavior of the predictor coefficients and prediction error variance
as $K$ gets large. The method of computation in Lemma 2.5 becomes particularly
useful when $K$ is large but $M$ is small (see Section 3.2 below).
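
The two routes to the prediction error variance, (2.9) through a block of $\Gamma$ indexed by $K \cup \{l\}$ and (2.4) through a block of $\Gamma^{-1}$ indexed by $M \cup \{l\}$, can be compared numerically. The sketch below assumes a hypothetical AR(1) covariance $\gamma_k = \phi^{|k|}/(1-\phi^2)$ and an arbitrary choice of target and missing indices; the helpers `inverse` and `block` are illustrative names.

```python
from fractions import Fraction

def inverse(matrix):
    """Invert a square matrix exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def block(matrix, rows):
    """Principal submatrix on the given index list (order preserved)."""
    return [[matrix[r][c] for c in rows] for r in rows]

PHI = Fraction(1, 2)   # hypothetical AR(1) parameter
SIZE = 6               # positions 0..5 play the role of the index set N
gamma = [[PHI ** abs(i - j) / (1 - PHI ** 2) for j in range(SIZE)]
         for i in range(SIZE)]

target = 3             # the index l
missing = [1]          # the index set M
observed = [k for k in range(SIZE) if k != target and k not in missing]

# (2.9): invert the large block of Gamma indexed by K and l (l listed last).
variance_normal = 1 / inverse(block(gamma, observed + [target]))[-1][-1]

# (2.4): invert the small block of Gamma^{-1} indexed by M and l (l listed last).
variance_dual = inverse(block(inverse(gamma), missing + [target]))[-1][-1]

print(variance_normal, variance_dual)
```

Both computations return the same value. In route (2.4) the block inverted in the last step has size $|M| + 1$ regardless of how many observations $K$ contains.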



3. Applications to Prediction Problems 

In this section, we illustrate the role of the finite duality principle (Lemmas
2.4 and 2.5) in unifying some diverse prediction problems for a zero-mean, weakly
stationary process $\{X_j\}_{j\in\mathbb{Z}}$ with the autocovariance function
$\gamma = \{\gamma_j\}_{j\in\mathbb{Z}}$: $\gamma_{i-j} = (X_i, X_j)$.

For simplicity, we assume that $\{X_j\}_{j\in\mathbb{Z}}$ is purely
nondeterministic, so it admits the MA representation (Wold decomposition)

(3.1)  $X_j = \sum_{k=-\infty}^{j} b_{j-k}\,\varepsilon_k, \quad j \in \mathbb{Z},$

where $\{\varepsilon_j\}_{j\in\mathbb{Z}}$ is the normalized innovation of
$\{X_j\}$ defined by

$\varepsilon_j := \{X_j - \hat X_j(\{\ldots, j-2, j-1\})\}/\|X_j - \hat X_j(\{\ldots, j-2, j-1\})\|, \quad j \in \mathbb{Z},$

and $\{b_k\}_{k=0}^{\infty}$ is the sequence of MA coefficients given by
$b_k := (X_0, \varepsilon_{-k})$. We define a sequence of complex numbers
$\{a_k\}_{k=0}^{\infty}$ by the relation

(3.2)  $\sum_{k=0}^{j} b_k a_{j-k} = \delta_{0j}, \quad j \ge 0.$

If the series $\sum_{j=0}^{\infty} a_j X_{-j}$ is mean-convergent, then (3.1) can
be inverted as

(3.3)  $\varepsilon_j = \sum_{k=-\infty}^{j} a_{j-k} X_k, \quad j \in \mathbb{Z}.$

This is essentially the same as the AR representation (see Pourahmadi (2001)), and
we call $\{a_k\}$ the AR coefficients of $\{X_j\}_{j\in\mathbb{Z}}$. As suggested
in (1.2) and (1.4), these $\{b_k\}$ and $\{a_k\}$ play an important role in
prediction problems.
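
Relation (3.2) determines the AR coefficients recursively from the MA coefficients: $a_0 = 1/b_0$ and $a_j = -b_0^{-1}\sum_{k=1}^{j} b_k a_{j-k}$ for $j \ge 1$. The following sketch implements this recursion; the function name `ar_from_ma` and the MA(1) test sequence are assumptions made for illustration.

```python
from fractions import Fraction

def ar_from_ma(b):
    """Return a_0, ..., a_{len(b)-1} solving sum_{k=0}^{j} b[k]*a[j-k] = delta_{0j}."""
    a = [1 / b[0]]
    for j in range(1, len(b)):
        # Move the k = 0 term of (3.2) to the left-hand side and divide by b_0.
        a.append(-sum(b[k] * a[j - k] for k in range(1, j + 1)) / b[0])
    return a

# Hypothetical MA(1) process: b_0 = 1, b_1 = 1/2 and b_k = 0 for k >= 2.
theta = Fraction(1, 2)
b = [Fraction(1), theta, Fraction(0), Fraction(0), Fraction(0)]
a = ar_from_ma(b)
print(a)
```

For this MA(1) sequence the recursion reproduces the geometric coefficients $a_k = (-\theta)^k$.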

3.1. Finite Prediction Problems with Missing Values. Let $M$ be a finite
set of integers that does not contain zero. Throughout this section, it represents
the index set of missing (unknown) values when predicting $X_0$. For given $M$,
we take the integers $m, n > 0$ so large that $M \subset N := \{-m, \ldots, n\}$, and
put $K = N\setminus(M \cup \{0\})$, which represents the index set of the observed
values, so that we have the partition $N = K \cup \{0\} \cup M$ as in Lemma 2.4. We
start with the prediction problem for a finite index set $K$. Once the problem is
solved for such a $K$, the solutions for the infinite index sets $S_n\setminus M$
and $S_\infty\setminus M$ are obtained by taking the limit of the solutions, first
as $m \to \infty$, and then as $n \to \infty$.

Traditionally, the coefficients of the finite linear predictor $\hat X_0(K)$ and its
prediction error variance $\sigma^2(K) = \|X_0 - \hat X_0(K)\|^2$ are expressed in
terms of the covariance function $\gamma$, using the normal equations (2.7).
However, the results so obtained are not convenient for studying the asymptotic
behavior of the predictor coefficients as $m \to \infty$ and/or $n \to \infty$. The
problem can be made much simpler by the finite duality principle and some
fundamental facts about the finite MA and AR representations, as we explain now
(see also Pourahmadi (2001)).

For the future segment $\{X_j\}_{j=0}^{\infty}$ of the process, we define its
normalized innovation $\{\varepsilon_{j,0}\}_{j=0}^{\infty}$ by the Gram-Schmidt
method: $\varepsilon_{0,0} := X_0/\|X_0\|$ and

$\varepsilon_{j,0} := \{X_j - \hat X_j(\{0, \ldots, j-1\})\}/\|X_j - \hat X_j(\{0, \ldots, j-1\})\|, \quad j \ge 1.$

Then $\{X_j\}$ and $\{\varepsilon_{j,0}\}$ admit the following finite MA and AR
representations:

$X_j = \sum_{k=0}^{j} b_{j-k,j}\,\varepsilon_{k,0}, \qquad \varepsilon_{j,0} = \sum_{k=0}^{j} a_{j-k,j} X_k, \quad j \ge 0.$

Here $\{b_{k,j}\}_{k=0}^{j}$ is defined by
$b_{k,j} := (X_j, \varepsilon_{j-k,0})$ and $\{a_{k,j}\}_{k=0}^{j}$ by

$\sum_{k=i}^{j} b_{j-k,j}\,a_{k-i,k} = \delta_{ij} \quad\text{or}\quad \sum_{k=i}^{j} a_{j-k,j}\,b_{k-i,k} = \delta_{ij}.$

These finite MA and AR coefficients converge to their infinite counterparts:

(3.4)  $\lim_{j\to\infty} b_{k,j} = b_k, \qquad \lim_{j\to\infty} a_{k,j} = a_k.$

If we consider $\{X_j\}_{j=-m}^{\infty}$ instead of $\{X_j\}_{j=0}^{\infty}$, then
by stationarity, it follows that

(3.5)  $X_j = \sum_{k=-m}^{j} b_{j-k,m+j}\,\varepsilon_{k,-m}, \qquad \varepsilon_{j,-m} = \sum_{k=-m}^{j} a_{j-k,m+j} X_k, \quad j \ge -m,$

where $\{\varepsilon_{j,-m}\}_{j=-m}^{\infty}$ is the normalized innovation of
$\{X_j\}_{j=-m}^{\infty}$ defined in the same way. We notice that

(3.6)  $\varepsilon_j = \lim_{m\to\infty} \varepsilon_{j,-m}, \quad j \in \mathbb{Z}.$

Thus, the representations in (3.5) reduce to (3.1) and (3.3) as $m \to \infty$.

Recall that $N = \{-m, \ldots, n\}$ and let $X$ be the vector $(X_j)_{j\in N}$ with
covariance matrix $\Gamma = (\gamma_{i-j})_{i,j\in N}$. From Proposition 2.3 (3),
its dual $Y$ is given by $Y = \Gamma^{-1}X$. Let $\varepsilon$ be the normalized
innovation vector of $X$, i.e., $\varepsilon := (\varepsilon_{j,-m})_{j\in N}$.
Then it follows from (3.5) that

$X = B\varepsilon, \qquad \varepsilon = AX,$

where $A$ and $B$ are the lower triangular matrices with $(i, j)$-entries
$a_{i-j,m+i}$ and $b_{i-j,m+i}$ for $-m \le j \le i \le n$, respectively. Since
$A = B^{-1}$ and $\Gamma = BB^*$, we have

$\Gamma^{-1} = A^*A, \qquad Y = A^*\varepsilon.$

Thus, the $(i, j)$-entry $\gamma^{i,j}$ of $\Gamma^{-1}$ and the $j$-th entry $Y_j$
of $Y$ have the representations

(3.7)  $\gamma^{i,j} = \sum_{k=i\vee j}^{n} \overline{a_{k-i,m+k}}\,a_{k-j,m+k}, \qquad Y_j = \sum_{k=j}^{n} \overline{a_{k-j,m+k}}\,\varepsilon_{k,-m},$

where $i \vee j := \max(i, j)$. These are certainly more conducive to studying
their limits as first $m \to \infty$ and then $n \to \infty$; see (3.4) and (3.6).

Now, we are ready to express the predictor $\hat X_0(K)$, the prediction error
$X_0 - \hat X_0(K)$ and its variance $\sigma^2(K)$ as prescribed by Lemma 2.5. In
particular, it follows from (2.1), (2.4) and (3.7) that

(3.8)  $\sigma^2(K) = $ the $(0, 0)$-entry of the inverse of $\Big(\sum_{k=i\vee j}^{n} \overline{a_{k-i,m+k}}\,a_{k-j,m+k}\Big)_{i,j\in M\cup\{0\}}$

and

(3.9)  $X_0 - \hat X_0(K) = \sum_{i\in M\cup\{0\}} \alpha'_i \Big(\sum_{k=i}^{n} \overline{a_{k-i,m+k}}\,\varepsilon_{k,-m}\Big),$

where the $\alpha'_i$'s are as in Lemma 2.5 with $l = 0$.

To highlight some far-reaching consequences of (3.8) and (3.9), a few special cases
corresponding to the classical prediction problems of Kolmogorov (1941), Yaglom
(1963) and Nakazi (1984) are singled out and listed as examples in the next section
according to the cardinality of the index set $M$ of the missing values.

3.2. Examples. In this section, we discuss three distinct examples of the use of
the finite duality principle and illustrate the process of obtaining results for the
two infinite index sets $S = S_n\setminus M$ and $S = S_\infty\setminus M$.

Since $\{X_j\}_{j\in\mathbb{Z}}$ is purely nondeterministic, it has the spectral
density function $f$ with $\log f \in L^1$:
$\gamma_j = (2\pi)^{-1}\int_{-\pi}^{\pi} e^{-ij\lambda} f(\lambda)\,d\lambda$. Also,
there exists an outer function $h$ in the Hardy class $H^2$ such that $f = |h|^2$
and $h(0) > 0$, and we have

(3.10)  $h(z) = \sum_{k=0}^{\infty} b_k z^k, \qquad \frac{1}{h(z)} = \sum_{k=0}^{\infty} a_k z^k$

in the unit disc. This shows that $f^{-1} \in L^1$ if and only if $\{a_k\}$ is
square summable. Using (3.10), which should be compared with (3.1)-(3.3), we can
define the MA and AR coefficients in an analytical way.

Example 3.1 (The Finite Kolmogorov-Nakazi Problem). This is a finite interpolation
problem corresponding to $K = \{-m, \ldots, n\}\setminus\{0\}$ and $M = \emptyset$
(empty set), and the solution of (2.3) is $\alpha'_0 = 1/\gamma^{0,0}$.
Consequently, from (3.7)-(3.9), we have

(3.11)  $\sigma^2(K) = \Big(\sum_{k=0}^{n} |a_{k,m+k}|^2\Big)^{-1}$

and

(3.12)  $X_0 - \hat X_0(K) = \Big(\sum_{k=0}^{n} |a_{k,m+k}|^2\Big)^{-1} \sum_{k=0}^{n} \overline{a_{k,m+k}}\,\varepsilon_{k,-m}.$

Next, we show that (3.11) and (3.12) are precursors of important results in
prediction theory due to Kolmogorov (1941), Masani (1960), and Nakazi (1984).

The result (1.4) of Nakazi (1984) for $S_n = \{\ldots, n-1, n\}\setminus\{0\}$ is
obtained by taking the limit of (3.11) as $m \to \infty$ (without assuming
$f^{-1} \in L^1$). Indeed, by (3.4), we see that (3.11) gives

(3.13)  $\sigma^2(S_n) = \Big(\sum_{k=0}^{n} |a_k|^2\Big)^{-1}.$

Also, in view of (3.4) and (3.6), it follows from (3.12) that

(3.14)  $X_0 - \hat X_0(S_n) = \Big(\sum_{k=0}^{n} |a_k|^2\Big)^{-1} \sum_{k=0}^{n} \overline{a_k}\,\varepsilon_k.$

The solution (1.3) of the Kolmogorov (1941) interpolation problem with
$S_\infty = \mathbb{Z}\setminus\{0\}$ follows from (3.13) by taking the limit as
$n \to \infty$, provided that $\{a_k\}$ is square summable. Thus, as in Kolmogorov
(1941), assuming that $\{X_t\}$ is minimal or $f^{-1} \in L^1$, we obtain

$\sigma^2(S_\infty) = \Big(\sum_{k=0}^{\infty} |a_k|^2\Big)^{-1} = \Big(\frac{1}{2\pi}\int_{-\pi}^{\pi} f(\lambda)^{-1}\,d\lambda\Big)^{-1}.$

Under the same minimality condition, the limit of (3.14) as $n \to \infty$ leads to

$X_0 - \hat X_0(S_\infty) = \Big(\sum_{k=0}^{\infty} |a_k|^2\Big)^{-1} \sum_{k=0}^{\infty} \overline{a_k}\,\varepsilon_k,$

which is Masani's (1960) representation of the two-sided innovation of $\{X_j\}$ at
time 0. It is instructive to note that this is a moving average in terms of the
future innovations. In fact, the source of such a moving average representation can
be traced to (3.7) and (3.14). A version of (3.14) seems to have appeared first in
Box and Tiao (1975) in the context of intervention analysis; see Pourahmadi (1989),
and Pourahmadi (2001, Section 8.4) for a more rigorous derivation, detailed
discussion and connection with outlier detection.
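
The passage from the finite problem (3.11) to Nakazi's formula (3.13) can be observed numerically. The sketch below assumes a hypothetical MA(1) process $X_t = e_t + \theta e_{t-1}$ with unit innovation variance, for which $a_k = (-\theta)^k$; it computes $\sigma^2(K) = 1/\gamma^{0,0}$ directly from the covariance matrix on $\{-m, \ldots, n\}$ and compares it with the limit. All names and parameter values are illustrative.

```python
def inverse(matrix):
    """Invert a square matrix in floating point by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

THETA = 0.5  # hypothetical MA(1) parameter

def autocovariance(lag):
    """Autocovariance of X_t = e_t + THETA * e_{t-1} with unit innovation variance."""
    lag = abs(lag)
    if lag == 0:
        return 1.0 + THETA ** 2
    return THETA if lag == 1 else 0.0

def finite_interpolation_variance(m, n):
    """sigma^2(K) for K = {-m, ..., n} minus {0}, computed as 1 / gamma^{0,0}."""
    indices = list(range(-m, n + 1))
    gamma = [[autocovariance(i - j) for j in indices] for i in indices]
    zero = indices.index(0)
    return 1.0 / inverse(gamma)[zero][zero]

def nakazi_limit(n):
    """Formula (3.13) with a_k = (-THETA)**k."""
    return 1.0 / sum(abs((-THETA) ** k) ** 2 for k in range(n + 1))

for m in (1, 5, 30):
    print(m, finite_interpolation_variance(m, 2), nakazi_limit(2))
```

As $m$ grows with $n = 2$ fixed, the finite interpolation variance decreases toward the value given by (3.13), since a longer observed past can only reduce the error.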

Our second example corresponds to M having cardinality one and hence involves 
inversion of 2 x 2 matrices, no matter how large K is. 

Example 3.2 (The Finite Past with a Single Missing Value). This problem
corresponds to $m > 0$, $n = 0$, $K = \{-m, \ldots, -1\}\setminus\{-u\}$ and
$M = \{-u\}$, where $1 \le u \le m$, so that $X_{-u}$ from the finite past of length
$m$ is missing. By (3.7), the $2 \times 2$ matrix for solving (2.3) is

$\begin{pmatrix} \gamma^{-u,-u} & \gamma^{-u,0} \\ \gamma^{0,-u} & \gamma^{0,0} \end{pmatrix} = \begin{pmatrix} \sum_{k=0}^{u} |a_{u-k,m-k}|^2 & \overline{a_{u,m}}\,a_{0,m} \\ \overline{a_{0,m}}\,a_{u,m} & |a_{0,m}|^2 \end{pmatrix}.$

Hence, using the subscript $m$ to emphasize the dependence on $m$, we have

$\alpha'_{0,m} = \frac{1}{\Delta_m} \sum_{k=0}^{u} |a_{u-k,m-k}|^2, \qquad \alpha'_{-u,m} = -\frac{1}{\Delta_m}\,\overline{a_{0,m}}\,a_{u,m},$

with the determinant $\Delta_m = |a_{0,m}|^2 \sum_{k=1}^{u} |a_{u-k,m-k}|^2$. Thus,
by (3.8) and (3.9),

(3.15)
$\sigma^2(K) = \dfrac{\sum_{k=0}^{u} |a_{u-k,m-k}|^2}{|a_{0,m}|^2 \sum_{k=1}^{u} |a_{u-k,m-k}|^2},$
$X_0 - \hat X_0(K) = \alpha'_{0,m}\,\overline{a_{0,m}}\,\varepsilon_{0,-m} + \alpha'_{-u,m} \sum_{k=0}^{u} \overline{a_{u-k,m-k}}\,\varepsilon_{-k,-m},$

and, taking the limit as $m \to \infty$,

(3.16)
$\sigma^2(S_0\setminus\{-u\}) = |b_0|^2\,\dfrac{\sum_{k=0}^{u} |a_k|^2}{\sum_{k=0}^{u-1} |a_k|^2},$
$X_0 - \hat X_0(S_0\setminus\{-u\}) = \alpha'_0\,\overline{a_0}\,\varepsilon_0 + \alpha'_{-u} \sum_{k=0}^{u} \overline{a_{u-k}}\,\varepsilon_{-k},$

where $\alpha'_0$ and $\alpha'_{-u}$ are the limits of $\alpha'_{0,m}$ and
$\alpha'_{-u,m}$, as $m \to \infty$, respectively.

The expressions in (3.16) were obtained first in Pourahmadi (1992); see also
Pourahmadi and Soofi (2000) and Pourahmadi (2001, Section 8.3). However, those
in (3.15) have not appeared before. For $n > 0$, slightly more general calculations
leading to analogues of (3.15) and (3.16) can be used to show that the inverse
autocorrelation function of $\{X_t\}$ at lag $u$ is the negative of the partial
correlation between $X_0$ and $X_u$ after elimination of the effects of $X_t$,
$t \ne 0, u$, as shown in Kanto (1984) for processes with strictly positive spectral
density functions.
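
The variance formula in (3.16) can be compared with a direct least-squares computation. The sketch below assumes a hypothetical AR(1) model with unit innovation variance, so that $b_0 = 1$, $a_0 = 1$, $a_1 = -\phi$ and $a_k = 0$ for $k \ge 2$; because such a process is Markov, a finite past of moderate length already gives the infinite-past variance exactly, which is a property of this example and not of the general case. The helper names are illustrative.

```python
from fractions import Fraction

def inverse(matrix):
    """Invert a square matrix exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def residual_variance(cov, target, predictors):
    """Error variance of the least-squares predictor of X[target] from X[predictors]."""
    sub_inv = inverse([[cov[r][c] for c in predictors] for r in predictors])
    cross = [cov[r][target] for r in predictors]
    size = len(predictors)
    explained = sum(cross[r] * sub_inv[r][c] * cross[c]
                    for r in range(size) for c in range(size))
    return cov[target][target] - explained

def closed_form(a, b0, u):
    """Variance in (3.16): |b_0|^2 * sum_{k<=u} |a_k|^2 / sum_{k<u} |a_k|^2."""
    top = sum(abs(a[k]) ** 2 for k in range(u + 1))
    bottom = sum(abs(a[k]) ** 2 for k in range(u))
    return abs(b0) ** 2 * top / bottom

PHI = Fraction(1, 2)   # hypothetical AR(1) parameter
a = [Fraction(1), -PHI, Fraction(0), Fraction(0)]
M_PAST = 6             # finite past X_{-6}, ..., X_{-1}; position p holds X_{p - M_PAST}
gamma = [[PHI ** abs(i - j) / (1 - PHI ** 2) for j in range(M_PAST + 1)]
         for i in range(M_PAST + 1)]

def direct(u):
    """Predict X_0 from the finite past with X_{-u} removed."""
    observed = [p for p in range(M_PAST) if p != M_PAST - u]
    return residual_variance(gamma, M_PAST, observed)

for u in (1, 2):
    print(u, direct(u), closed_form(a, 1, u))
```

For $u = 1$ both computations give the two-step prediction variance $1 + \phi^2$, while for $u = 2$ they give the one-step variance, in line with the remark that a missing value is harmless when the corresponding AR parameter is zero.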

Example 3.3 (The Finite Yaglom Problem). There are many situations where
the cardinality of $M$ is two or more; see Pourahmadi et al. (2007), Box and Tiao
(1975), Brubacher and Wilson (1976), Damsleth (1980), Abraham (1981). In the
literature of time series analysis, there are several ad hoc methods for interpolating
the missing values. For example, Brubacher and Wilson (1976) minimize

$\sum_{j=-m}^{n} |\varepsilon_j|^2 = \sum_{j=-m}^{n} \Big|\sum_{k=-m}^{j} a_{j-k} X_k\Big|^2$

with respect to the unknown $X_j$, $j \in M \cup \{0\}$, and then study the solution
of the normal equations as $m, n \to \infty$. Budinsky (1989) has shown that this
approach under some conditions gives the same result as the more rigorous approach
of Yaglom (1963). In applying Lemma 2.5 to this problem, we first note that, due to
the large cardinality of $M$, handling (3.8) and (3.9) via (2.3) does not lead to
simple explicit formulas as in (3.15) and (3.16). Nevertheless, the limits of the
expressions in (3.8) and (3.9) as first $m \to \infty$, and then as $n \to \infty$
(assuming $f^{-1} \in L^1$) have simple forms in terms of the AR parameters:

(3.17)  $\sigma^2(S) = $ the $(0, 0)$-entry of the inverse of $\Big(\sum_{k=i\vee j}^{\infty} \overline{a_{k-i}}\,a_{k-j}\Big)_{i,j\in M\cup\{0\}}.$

Now, using (3.10) and writing the entries of the above matrix in terms of the
Fourier coefficients of $f^{-1}$, it follows that (3.17) reduces to the results in
Yaglom (1963); see also Salehi (1979).

3.3. The Infinite Past and the Wold Decomposition. A more direct method
of solving prediction problems for $S = S_n\setminus M$ is to reduce them to a
different class of finite prediction problems than those in Section 3.2. This is
done by using the Wold decomposition of a purely nondeterministic stationary process.

As in Section 3.1, write $N = \{-m, \ldots, n\}$ and $N = K \cup \{0\} \cup M$
(disjoint), so that
$S = S_n\setminus M = \{\ldots, -m-2, -m-1\} \cup K$ (disjoint). For $j \ge -m$,
let $\tilde X_j$ be the linear least-squares predictor of $X_j$ based on the
infinite past $\{X_k; k < -m\}$. Then, by (3.1),

$X_j - \tilde X_j = \sum_{k=-m}^{j} b_{j-k}\,\varepsilon_k, \quad j \ge -m,$

which are orthogonal to $\mathrm{sp}\{X_j; j < -m\}$, and it follows that

$\mathrm{sp}\{X_j; j \in S\} = \mathrm{sp}\{X_j - \tilde X_j; j \in K\} \oplus \mathrm{sp}\{X_j; j < -m\}.$

This equality plays the key role in finding the predictor of $X_0$ and its prediction
error variance, based on $\{X_j; j \in S\}$. In fact, by using it, we only have to
solve the problem of predicting $X_0 - \tilde X_0$ based on
$\{X_j - \tilde X_j; j \in K\}$. More precisely, we consider
$X' := (X_j - \tilde X_j)_{j\in N}$, which has the covariance matrix
$G = (g_{i,j})_{i,j\in N}$ with

$g_{i,j} = \sum_{k=-m}^{i\wedge j} b_{i-k}\,\overline{b_{j-k}}, \qquad i \wedge j := \min(i, j)$

(see Pourahmadi (2001, p. 273)). Then, writing
$X_0 = \tilde X_0 + (X_0 - \tilde X_0)$, we get

(3.18)
$\hat X_0(S) = \tilde X_0 + \sum_{k\in K} \alpha_k (X_k - \tilde X_k),$
$\sigma^2(S) = \Big\|(X_0 - \tilde X_0) - \sum_{k\in K} \alpha_k (X_k - \tilde X_k)\Big\|^2,$

where $\sum_{k\in K} \alpha_k (X_k - \tilde X_k)$ is the predictor of
$X_0 - \tilde X_0$ based on $\{X_k - \tilde X_k; k \in K\}$, and the predictor
coefficients $\alpha_k$ and prediction error variance $\sigma^2(S)$ are obtained
from the normal equations (2.7) with $\gamma_{i,j}$ replaced by $g_{i,j}$; in
particular, by (2.9),

$\sigma^2(S) = \Big[\text{the } (0, 0)\text{-entry of the inverse of } \Big(\sum_{k=-m}^{i\wedge j} b_{i-k}\,\overline{b_{j-k}}\Big)_{i,j\in K\cup\{0\}}\Big]^{-1}.$

We can also apply Lemma 2.5 to the above finite prediction problem for $X'$. In
so doing, the following representations for the $(i, j)$-entry $g^{i,j}$ of
$G^{-1}$ and the $j$-th entry $Y_j$ of the dual $Y$ of $X'$ are available:

(3.19)  $g^{i,j} = \sum_{k=i\vee j}^{n} \overline{a_{k-i}}\,a_{k-j}, \qquad Y_j = \sum_{k=j}^{n} \overline{a_{k-j}}\,\varepsilon_k.$

In fact, these are obtained by using (3.2) and Proposition 2.3 (3) or by letting
$m \to \infty$ in (3.7). The explicit representations in (3.19) are also important in
finding series representations for predictors and interpolators discussed in the next
two subsections.

3.4. Series Representation of the Predictors. The Wold decomposition (3.1)
is often used to express predictors and prediction errors in terms of the innovation
process $\{\varepsilon_t\}$. This strategy works well for achieving the goal $(P_2)$
in Section 1, but since the innovation $\varepsilon_t$ is not directly observable,
the resulting predictor formulas are not suitable for computation. To get around
this difficulty, one must express the innovations or the predictors in terms of the
past observations. In this section, we obtain series representations for the
infinite past predictors in terms of the observed values. A novelty of our approach
is its reliance on the representation of the prediction error in terms of the dual
$Y$ in (3.19); hence the solution of the problem $(P_1)$ for $S = S_n\setminus M$
is more direct and simpler than the procedures of Bondon (2002, Theorem 3.1) and
Nikfar (2006).

Assuming that $\{X_j\}_{j\in\mathbb{Z}}$ has the mean-convergent AR representation
(3.3), it follows from (3.18) with
$S = \{\ldots, -m-2, -m-1\} \cup K$ that

$\hat X_0(S) = \sum_{k\in K} \alpha_k X_k + \sum_{j=1}^{\infty} \Big(h_{m,j} - \sum_{k\in K} \alpha_k h_{m+k,j}\Big) X_{-m-j},$

where $h_{k,j} := -\sum_{i=0}^{k} b_{k-i}\,a_{j+i}$ is the coefficient of the
$(k+1)$-step ahead predictor based on the infinite past
$S_0 = \{\ldots, -2, -1\}$, i.e.,
$\hat X_k(S_0) = \sum_{j=1}^{\infty} h_{k,j} X_{-j}$ for $k = 0, 1, \ldots$. On the
other hand, from the finite duality principle or, more precisely, (2.1) with (3.19),
we have

$\hat X_0(S) = X_0 - \sum_{i\in M\cup\{0\}} \alpha'_i \Big(\sum_{k=i}^{n} \overline{a_{k-i}}\,\varepsilon_k\Big).$

From this, replacing $\varepsilon_k$ by (3.3) and after some algebra, we get the
following alternative series representation for the predictor of $X_0$ based on the
incomplete past:

(3.20)  $\hat X_0(S) = -\sum_{j\in S} \Big(\sum_{i\in M\cup\{0\}} \alpha'_i \sum_{k=i\vee j}^{n} \overline{a_{k-i}}\,a_{k-j}\Big) X_j.$

We note that the prediction error here has the representation

(3.21)  $X_0 - \hat X_0(S) = \sum_{i\in M\cup\{0\}} \alpha'_i \Big(\sum_{k=i}^{n} \overline{a_{k-i}}\,\varepsilon_k\Big)$

in terms of the dual $Y$ in (3.19). Furthermore, the sequence
$\{Y_j = \sum_{k=j}^{n} \overline{a_{k-j}}\,\varepsilon_k\}_{j=-\infty}^{n}$ spans
$\mathrm{sp}\{X_j; j \le n\}$, the infinite past up to $n$ of the process
$\{X_t\}$. The formulas (3.20) and (3.21) were obtained initially by Bondon (2002,
Theorem 3.2) without using the notion of duality.

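3.5. Series Representation of the Interpolators. A series representation for the
interpolator of $X_0$ based on the observed values from the index set
$S = S_\infty\setminus M = \mathbb{Z}\setminus(M \cup \{0\})$ was obtained by Salehi
(1979). Here we obtain such a representation using the idea of the dual process.
Assuming $f^{-1} \in L^1$ or $\sum_{j=0}^{\infty} |a_j|^2 < \infty$, the process

$\xi_j := \sum_{k=j}^{\infty} \overline{a_{k-j}}\,\varepsilon_k, \quad j \in \mathbb{Z},$

is well-defined in the sense of mean-square convergence. From (3.1), (3.2), and the
above results, we have the following:

(i) $(X_i, \xi_j) = \delta_{ij}$ for $i, j \in \mathbb{Z}$.

(ii) $\xi_j = \{X_j - \hat X_j(\mathbb{Z}\setminus\{j\})\}/\|X_j - \hat X_j(\mathbb{Z}\setminus\{j\})\|^2$ for $j \in \mathbb{Z}$.

(iii) $\{\xi_j; j \in \mathbb{Z}\}$ spans the space $\mathrm{sp}\{X_j; j \in \mathbb{Z}\}$.

(iv) $\{\xi_j; j \in \mathbb{Z}\}$ is a stationary process with the autocovariance
function

$\gamma^j := \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-ij\lambda} f(\lambda)^{-1}\,d\lambda, \quad j \in \mathbb{Z},$

i.e., $(\xi_i, \xi_j) = \gamma^{i-j} = \sum_{k=i\vee j}^{\infty} \overline{a_{k-i}}\,a_{k-j}$ for $i, j \in \mathbb{Z}$.

The process $\{\xi_j\}$ has already appeared in prediction theory and time series
analysis, and is called the standardized two-sided innovation (Masani (1960)) or the
inverse process (Cleveland (1972)) of $\{X_t\}_{t\in\mathbb{Z}}$.

Now, for solving the interpolation problem with
$S = \mathbb{Z}\setminus(M \cup \{0\})$, we need to show that
$\{\xi_j; j \in M \cup \{0\}\}$ spans the orthogonal complement of
$\mathrm{sp}\{X_j; j \in S\}$ in $\mathrm{sp}\{X_j; j \in \mathbb{Z}\}$. Then, it
turns out that there is a unique $(\alpha'_i)_{i\in M\cup\{0\}}$ satisfying

$X_0 - \hat X_0(S) = \sum_{i\in M\cup\{0\}} \alpha'_i\,\xi_i = \sum_{i\in M\cup\{0\}} \alpha'_i \Big(\sum_{k=i}^{\infty} \overline{a_{k-i}}\,\varepsilon_k\Big)$

(see (3.9) and (3.21)), and that $\sigma^2(S) = \alpha'_0$. Since

$(X_0, \xi_j) - \sum_{i\in M\cup\{0\}} \alpha'_i (\xi_i, \xi_j) = (\hat X_0(S), \xi_j) = 0, \quad j \in M \cup \{0\},$

we can compute $(\alpha'_i)_{i\in M\cup\{0\}}$ by solving the following system of
linear equations:

$\sum_{i\in M\cup\{0\}} \alpha'_i\,\gamma^{i-j} = \delta_{0j}, \quad j \in M \cup \{0\}.$

As for the predictor, if $\sum_{j=-\infty}^{\infty} \gamma^j X_{-j}$ is
mean-convergent, then $(\xi_j)_{j\in\mathbb{Z}}$ admits the representation

$\xi_j = \sum_{k=-\infty}^{\infty} \gamma^{j-k} X_k, \quad j \in \mathbb{Z},$

and we have

$\hat X_0(S) = -\sum_{j\in S} \Big(\sum_{i\in M\cup\{0\}} \alpha'_i\,\gamma^{i-j}\Big) X_j,$

which is the two-sided version of the formula (3.20).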

4. Discussion and Future Work



We have reviewed and unified some important results from prediction theory
of stationary processes using a finite-dimensional duality principle whose proof is
based on elementary ideas from linear algebra. Our time-domain, geometric
and finite-dimensional approach brings considerable clarity and simplicity to this
area of prediction theory as compared to the classical spectral-domain approach
based on analytic function theory and duality in infinite-dimensional spaces.
Since our duality lemma is not confined to stationary processes or Toeplitz
matrices, it has the potential of being useful in solving similar prediction problems
for nonstationary processes, particularly those with low displacement ranks (Kailath
and Sayed (1995)). However, the present form of the lemma does not seem to be
useful for prediction problems of infinite-variance or $L^p$-processes (Cambanis and
Soltani (1984), Cheng et al. (1998)).